llm-graph-builder/example.env
aashipandya 8ea4bb120d Staging to main (#710)
* Dev (#537)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env, updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario "when the text from schema closes, it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* disabled the submit button on loading

* Deduplication tab (#566)

* de-duplication API

* Update De-Duplicate query

* created the Deduplication tab

* added the API service

* added the removable tags for similar nodes in deduplication tab

* Integrate Tag

* added GraphLabel

* added loader state

* added the merge service

* integrated the merge API

* Merge Query issue fixed

* Auto refresh the duplicate nodes after merging operation

* added the description for de duplication

* reset on merging

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Update frontend_docs.adoc (#538)

* Update frontend_docs.adoc

* doc update

* Images

* Images folder change

* Images folder change

* test image

* Update frontend_docs.adoc

* image change

* Update frontend_docs.adoc

* Update frontend_docs.adoc

* added the Graph Mode SS

* added the Query SS

* Update frontend_docs.adoc

* conflicts fix

* conflict fix

* Update frontend_docs.adoc

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* updated langchain versions (#565)

* Update the De-Duplication query

* Node relationship id type none issue (#547)

* de-duplication API

* Update De-Duplicate query

* Issue fixed: Nodes, Relationship Id and Type None or Blank

* added the tooltips

* type fix

* Unnecessary import

* added score threshold and added some error handling (#571)

* Update requirements.txt

* Tooltip and other UI fixes (#572)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Null issue fixed in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camelCase

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks

* removed page number

…

* Graph visualization removal of dropdown & Schema popup (#575)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: Processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* file-table

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* file-table

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* file-table

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camel Case

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks…

* connection creation in extract and CancelledError handling for sse (#584)

* Update the de-duplication nodes list query

* Format fixes

* accessibility fixes

* added the name for checkbox

* reset the loading state on API failure

* openai llm as default (#588)

* resetting the duplicate nodes state when there is no data returned from the API

* New Graph query changes (#586)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server-sent event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clipboard

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed out of range issue in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add lifecycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server-sent event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clipboard

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed out of range issue in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add lifecycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* fixed word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* Fixed uppercase file extensions; delete file from GCS or local based on env variable.

* time per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server-sent event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clipboard

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed out of range issue in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add lifecycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* fixed word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries,refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camel Case

* added the icon and extern link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks

* removed page number

*…

* Update the duplicate nodes query

* updated graph query (#590)

* Added 2 API endpoints to get the vector dimension and drop_recreate vector index with correct dimensions

* Merge get_vector_dimension API with /connect API

* Drop index only when it's exist

* GPT 4o mini integration (#592)

* openai llm as default

* added gpt-4o mini as llm

* added the openai-gpt-4o-mini

* Updated model ids

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* 549 graph visualization removal of dropdown (#601)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error: "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error: "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructuredfiles to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructuredfiles to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries,refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camel Case

* added the icon and extern link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks

* remov…

* Update GenericSourceModal.tsx

* Data table filtering (#589)

* added the dropdown

* added the filters for status column

* Changed the options name

* removed unused variable

* moved the checkbox into UI

* added the filters for file type and llm column

* autoresetting the page when filtering is applied

* removed dev console.log

* Remove connection close from finally block and make DB parameter required

* Performance test

* Modified Integration test

* Add nltk package through code and pypandoc. Remove page_number

* updated script

* Vector dimension reset (#594)

* added the vector index API

* integrated the create vector index API

* added the markdown for alert message

* Moved the alert into the connection Modal

* fixed the new vector index param

* added the vector index dimension to text

* Update VectorIndexMisMatchAlert.tsx

* moved types to types files

* removed hardcoded values

* added the selected state indicator for filter types

* added missed dependency

* increased the table height

* Hybrid search (#611)

* added fulltext index for hybrid search

* added hybrid search mode

* modified query

* modified the neo4j retriever

* modified script for wiki page

* fixed the vector index param on submit

* format fixes

* type fix

* fixed orphan delete loading state

* Fix the issue "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 2152: invalid start byte"

* gpt-4o-mini default

* Issue fixed in extract API "UnboundLocalError: local variable 'graphDb_data_Access' referenced before assignment"

* Youtube timestamps (#612)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

* initial changes for youtube timestamps

* updated way of getting youtube transcript

* corrected query

* constants updated

* time for chunk in chatbot details

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* added the lazy loading for Dialogs

* moved into utils

* removed data uploading through axios in dropzone used service API

* lazy loading fallback UI

* Update requirements.txt

* Add file Cypher_queries

* updated cypher_queries file

* fixed json parse issue

* Added Preload to prevent reloading the LLM model SentenceTransform, and used env_file in docker-compose yaml to pull variables directly from file

* Rollback env_file attribute from compose.yml

* updated Cypher Queries File

* updated Cypher queries file

* added fireworks new model

* UI for post processing in graph enhancements with a checkbox list (#627)

* added the tab

* added the description for post processing jobs

* integrated postprocessing state

* tablet mode responsiveness

* DO NOT MERGE - Document, chunk node labels and relation labels updated with underscore (#626)

* Document and Chunk node label updated

* added underscores for all relationship types

* updated queries from DEV

* updated Document and Chunk nodes after DEV merge

* Label Changes

* Exclude labels starting with __ from schema API

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Application responsiveness desktop laptop tablet (#624)

* added useMedia query hook

* add media query

* added the datasources in sidenav for smaller files

* added the dropzone

* removed the unused variables

* fixed the json.parse bug

* style changes according to the viewport

* added the tooltip

* decreased the image in tablet modal

* format fixes

* changed the variable

* added the buttons

* decreased the image size and text content

* drawer fix

* type change

* removed blank sidenavigation item

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* format fixes

* code improvement

* code improvement for boolean state

* icon rendering fix

* Ignore relationship types whose labels start with __

* schema check

* Update README.md

* Dropped the old vector index (#652)

* added cypher_queries and llm chatbot files

* updated llm-chatbot-python

* added llm-chatbot-python

* updated llm-chatbot-python folder

* fixed loader issue for lazy loading in chat info dialog

* Added chatbot "hybrid" mode use case

* __ changes (#656)

* DiffbotGraphTransformer doesn't need an LLMGraphTransformer (#659)

Co-authored-by: jeromechoo <hello@jeromechoo.com>

* Removed experiments/llm-chatbot-python folder from DEV branch

* Removed experiments/Cypher_Queries.ipynb file from DEV branch

* reduced the password clear timeout

* disabled the close button on the banner and connection dialog while the API request is pending

* update delete query with entities

* node id check (#663)

* Status source and type filtering (#664)

* status source

* Name change

* type change

* added Hybrid Chat modes (#670)

* Rename the function #657

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* Graph node filename check

* env fixes with latest nvl libraries

* format fixes

* Remove TotalPages when saving a file locally (#684)

* file_name reference and verify_ssl issue fixed (#683)

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Concurrent processing of files (#665)

* Update README.md

* Dropped the old vector index (#652)

* added cypher_queries and llm chatbot files

* updated llm-chatbot-python

* added llm-chatbot-python

* updated llm-chatbot-python folder

* Added chatbot "hybrid" mode use case

* added the concurrent file processing

* page refresh scenario

* fixed waiting files processing issue in refresh scenario

* removed boolean param

* fixed processedCount issue

* checkbox with waiting check

* fixed the refresh scenario with processing files

* processing files check

* server side error

* processing file count check for processing files less than batch size

* processing count check to handle all selected files

* created helper functions

* code improvements

* __ changes (#656)

* DiffbotGraphTransformer doesn't need an LLMGraphTransformer (#659)

Co-authored-by: jeromechoo <hello@jeromechoo.com>

* Removed experiments/llm-chatbot-python folder from DEV branch

* reduced the password clear timeout

* Removed experiments/Cypher_Queries.ipynb file from DEV branch

* disabled the close button on the banner and connection dialog while the API request is pending

* update delete query with entities

* node id check (#663)

* Status source and type filtering (#664)

* status source

* Name change

* type change

* rollback to previous working nvl version

* added the alert

* add BATCH_SIZE to docker

* temp fixes for 0.3.1

* alert fix for less than batch size processing

* new virtual env

* added Hybrid Chat modes (#670)

* Rename the function #657

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* Graph node filename check

* env fixes with latest nvl libraries

* format fixes

* removed local files

* Remove TotalPages when saving a file locally (#684)

* file_name reference and verify_ssl issue fixed (#683)

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* ndl changes

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Status source and type filtering (#664)

* status source

* Name change

* type change

* added the alert

* temp fixes for 0.3.1

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* ndl changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* added cypher_queries and llm chatbot files

* updated llm-chatbot-python

* added llm-chatbot-python

* updated llm-chatbot-python folder

* page refresh scenario

* fixed waiting files processing issue in refresh scenario

* Removed experiments/llm-chatbot-python folder from DEV branch

* disabled the close button on the banner and connection dialog while the API request is pending

* node id check (#663)

* Status source and type filtering (#664)

* status source

* Name change

* type change

* rollback to previous working nvl version

* added the alert

* temp fixes for 0.3.1

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* ndl changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Status source and type filtering (#664)

* status source

* Name change

* type change

* added the alert

* temp fixes for 0.3.1

* label and checkboxes placement changes (#675)

* label and checkboxes placement changes

* checkbox placement changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* ndl changes

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* env fixes with latest nvl libraries

* format fixes

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* property spell fix

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Jayanth T <jayanth_t@persistent.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Jerome Choo <mail@jeromechoo.com>
Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* format fixes

* fixed the row selection issue

* clearing the queue when there are no files in the db

* Update Dockerfile (#694)

* env changes for VITE (#690)

* removed the processing count update on the server-sent event's error event, because the server-sent event runs again until it is cancelled from the client side

* removed hardcode value

* removed hardcoded values for resetting the processing count

* function definition changes

* vite prefix

* Update docker-compose.yml (#688)

* Remove TotalPages when saving a file locally (#684)

* file_name reference and verify_ssl issue fixed (#683)

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Remove TotalPages when saving a file locally (#684)

* file_name reference and verify_ssl issue fixed (#683)

* User flow changes for recreating supported vector index (#682)

* removed the if check

* Add another check for creating the vector index when chunks exist without embeddings

* removed local files

* condition changes

* chunks exists check

* chunk exists without embeddings check

* vector Index issue fixed

* vector index with different dimension

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Reapply "Dockerfile changes with VITE label"

This reverts commit a83e0855fb.

* Revert "Dockerfile changes with VITE label"

This reverts commit 2840ebc9e6.

* Update docker-compose.yml

* Update graphDB_dataAccess.py

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* enabled the entity extraction by default

* Fix typo: correct 'josn_obj' to 'json_obj' (#697)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* PDF deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* PDF deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract API

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env, which was causing a NaN in the approximate processing time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert based on total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connect when graph object is not none

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrieval query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camel Case

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks

* remove…

* fixed model rendering fix for waiting files

* lint fixes

* Fix typo: correct 'josn_obj' to 'json_obj' (#697)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert based on total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert based on total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error: "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error: "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error: "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file was re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camelCase

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

* removed hover from chunks

* remove…

* lint fixes

* lint fixes

* connection _check

* Dev (#701)

* lint fixes

* connection _check

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* DEV to STAGING (#703)

* Chatbot changes (#700)

* added Hybrid search from graph

* modified mode params

* fixed issue delete entities return count

* removed specified version due to dependency clashes between versions

* updated script "integration test cases"

* decreased the delay for polling API

* Graph enhancements (#696)

* relationship Changes

* addition of relationship labels

* onclick to nodes

* node-highlight

* Build fixes

* slash docker change

* deactivating previous node/relationships

* lint fixes

* class issue

* search

* search on basis of id / captions

* debounce changes

* class changes (#693)

* legends highlight

* search query reset

* node size

* changed chat mode names (#702)

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: Pravesh1988 <pravesh_kumar@persistent.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* Dev (#705)

* connection _check

* Fix typo: correct 'josn_obj' to 'json_obj' (#697)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrival query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camelCase

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoModal.tsx

…

* default modes in staging

* processing count update fix on cancel

* Dev to Staging (#709)

* connection _check

* Fix typo: correct 'josn_obj' to 'json_obj' (#697)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file is reprocessed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue: remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* PDF deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add lifecycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* PDF deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add lifecycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* PDF deletion due to running out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue where processed chunks showed as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Added the missing TIME_PER_PAGE env variable, which was causing a NaN in the approximate processing-time notification

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Fixed null issue in the backend for the upload API and graph_document when the model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* Fixed uppercase file extensions; delete file from GCS or local storage based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* format fixes

* Close connection when graph object is not None

* Call garbage collector to release the memory

* Change error message

* Added driver config as user_agent

* Updated doc for the LLM_MODELS and GCS_FILE_CACHE (#473)

* Web URLs are user input (#475)

* web url support backend

* added the tabs for input source

* user agent added for Neo4jGraph connection

* Tab view for sources

* extract handling for web URLs

* initial input handling

* chunk creation before processing

* code structure

* format fixes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* changed the regex for web and cancel button naming

* changed the schema dropdown type

* readme updates

* PROD version fix

* changed the alert message for gcs

* Delete unconnected entities from DB (#482)

* 457 add schema before generate graph (#478)

* schema setting from generate graph

* changes

* changes

* badge changes

* bug fix

* Fulltext index and Update similarity graph (#479)

* added full_text index

* added one common function for post_processing

* post processing api

* added tasks param

* modified logging

* post processing changes

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Graph and vector search (#485)

* Modified the retrieval query

* added the chatmode toggle component

* Modified to vector search

* Moved the templates to constants

* added the icons

* added chat modes

* code structure changes

* Integrated the API changes

* Modified retrieval queries, refactored code

* API integration changes

* added the score

* order change

* wording change

* modified constants

* added graph+vector

* added the tooltips

* Modified query

* removed the graph mode

* tooltip camelCase

* added the icon and external link for web source in the info modal

* added the youtube link in the source used tab

* format fixes

* added the hoverable link

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Update InfoMo…

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Jayanth T <jayanthengineer07@gmail.com>
Co-authored-by: Jayanth T <jayanth_t@persistent.com>
Co-authored-by: Jerome Choo <mail@jeromechoo.com>
Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Michael Hunger <github@jexp.de>
Co-authored-by: Kain Shu <44948284+Kain-90@users.noreply.github.com>
Co-authored-by: destiny966113 <90891243+destiny966113@users.noreply.github.com>
Co-authored-by: Pravesh1988 <pravesh_kumar@persistent.com>
2024-08-27 17:52:59 +05:30


# Mandatory
OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
# Optional Backend
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
IS_EMBEDDING = "true"
KNN_MIN_SCORE = "0.94"
# Enable Gemini (default is False) | Can be False or True
GEMINI_ENABLED = False
# LLM_MODEL_CONFIG_ollama_llama3="llama3,http://host.docker.internal:11434"
# Enable Google Cloud logs (default is False) | Can be False or True
GCP_LOG_METRICS_ENABLED = False
NUMBER_OF_CHUNKS_TO_COMBINE = 6
UPDATE_GRAPH_CHUNKS_PROCESSED = 20
NEO4J_URI = "neo4j://database:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"
LANGCHAIN_API_KEY = ""
LANGCHAIN_PROJECT = ""
LANGCHAIN_TRACING_V2 = "true"
LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com"
GCS_FILE_CACHE = False
ENTITY_EMBEDDING=True
# Optional Frontend
VITE_BACKEND_API_URL="http://localhost:8000"
VITE_BLOOM_URL="https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true"
VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,web"
VITE_LLM_MODELS="diffbot,openai-gpt-3.5,openai-gpt-4o" # ",ollama_llama3"
VITE_ENV="DEV"
VITE_TIME_PER_PAGE=50
VITE_CHUNK_SIZE=5242880
VITE_GOOGLE_CLIENT_ID=""
VITE_CHAT_MODES=""
VITE_BATCH_SIZE=2