Commit 1d87310

Merge branch 'staging' into gen_ml_flag
2 parents f8c7f2b + 3171bb2 commit 1d87310

8 files changed: 96 additions (+96), 9 deletions (-9)
assets/contributions-agreement/signatures/cla.json

Lines changed: 24 additions & 0 deletions
@@ -5119,6 +5119,30 @@
 "created_at": "2023-06-14T06:24:39Z",
 "repoId": 143328315,
 "pullRequestNo": 6587
+},
+{
+"name": "akj2018",
+"id": 43956935,
+"comment_id": 1593813121,
+"created_at": "2023-06-15T22:44:19Z",
+"repoId": 143328315,
+"pullRequestNo": 6596
+},
+{
+"name": "truskovskiyk",
+"id": 7893705,
+"comment_id": 1594494194,
+"created_at": "2023-06-16T10:55:59Z",
+"repoId": 143328315,
+"pullRequestNo": 6620
+},
+{
+"name": "Aman123lug",
+"id": 94223645,
+"comment_id": 1594700571,
+"created_at": "2023-06-16T13:44:38Z",
+"repoId": 143328315,
+"pullRequestNo": 6546
 }
 ]
 }

docs/data-integrations/datastax.mdx

Lines changed: 4 additions & 0 deletions
@@ -16,6 +16,10 @@ The required arguments to establish a connection are as follows:
 * `user` is the user to authenticate.
 * `password` is the password to authenticate the user.
 * `secure_connection_bundle` is the path to the `secure_connection_bundle` zip file.
+<Tip>
+If you installed MindsDB locally via pip, you need to install all handler dependencies manually. To do so, go to the handler's folder (mindsdb/integrations/handlers/datastax_handler) and run this command: `pip install -r requirements.txt`.
+</Tip>
+
 
 ## Usage
 

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+## Testing: Hugging Face - Toxicity Tutorial
+
+Testing CREATE MODEL
+
+```sql
+CREATE MODEL mindsdb.hf_toxicity
+PREDICT PRED
+USING
+engine = 'huggingface',
+task = 'text-classification',
+model_name = 'SkolkovoInstitute/roberta_toxicity_classifier',
+input_column = 'text';
+```
+
+Output:
+
+```sql
+| NAME | ENGINE | PROJECT | ACTIVE | VERSION | STATUS | ACCURACY | PREDICT | UPDATE_STATUS | MINDSDB_VERSION | ERROR | SELECT_DATA_QUERY | TRAINING_OPTIONS | TAG |
+| ---- | ------ | ------- | ------ | ------- | ------ | -------- | ------- | ------------- | --------------- | ----- | ----------------- | ---------------- | --- |
+| hf_toxicity | huggingface | mindsdb | true | 1 | generating | [NULL] | PRED | up_to_date | 23.6.3.1 | [NULL] | [NULL] | {'target': 'PRED', 'using': {'task': 'text-classification', 'model_name': 'SkolkovoInstitute/roberta_toxicity_classifier', 'input_column': 'text'}} | [NULL] |
+```
+
+Testing SELECT (making predictions)
+
+```sql
+SELECT *
+FROM mindsdb.hf_toxicity
+WHERE text = 'I like you. I love you.';
+```
+
+Output:
+
+```sql
+| PRED | PRED_explain | text |
+| ---- | ------------ | ---- |
+| neutral | {"neutral":0.9999547004699707,"toxic":0.000045352782763075083} | I like you. I love you. |
+```

mindsdb/api/http/namespaces/analysis.py

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,6 @@
 import time
 
+import pandas as pd
 from flask import request
 from flask_restx import Resource
 from pandas.core.frame import DataFrame
@@ -17,6 +18,17 @@
 def analyze_df(df: DataFrame) -> dict:
     if len(df) == 0:
         return {}
+
+    cols = pd.Series(df.columns)
+
+    # https://stackoverflow.com/questions/24685012/pandas-dataframe-renaming-multiple-identically-named-columns
+    for dup in cols[cols.duplicated()].unique():
+        cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in
+                                                         range(sum(cols == dup))]
+
+    # rename the columns with the cols list.
+    df.columns = cols
+
     analysis = analyze_dataset(df)
     return analysis.to_dict()
 
2234

mindsdb/api/mysql/mysql_proxy/classes/sql_query.py

Lines changed: 6 additions & 0 deletions
@@ -106,6 +106,12 @@ def get_table_alias(table_obj, default_db_name):
         else:
             name = table_obj.alias.parts[0]
         name = (default_db_name, name)
+    elif isinstance(table_obj, Join):
+        # get from first table
+        return get_table_alias(table_obj.left, default_db_name)
+    else:
+        # unknown yet object
+        return default_db_name, 't', 't'
 
     if table_obj.alias is not None:
         name = name + ('.'.join(table_obj.alias.parts),)
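
Review note: the two new branches make `get_table_alias` recurse into the left side of a join and fall back to a placeholder for unrecognised objects. The sketch below shows that behaviour in isolation. It is a simplified illustration, not the MindsDB implementation: the `Identifier` and `Join` dataclasses are hypothetical stand-ins for the parser's AST nodes, and the handling of plain identifiers (including the no-alias case) is an assumption, since the hunk shows only part of the function.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Identifier:
    """Hypothetical stand-in for a table identifier AST node."""
    parts: Tuple[str, ...]
    alias: Optional["Identifier"] = None


@dataclass
class Join:
    """Hypothetical stand-in for a join AST node."""
    left: object
    right: object


def get_table_alias(table_obj, default_db_name):
    """Return a (database, table, alias) tuple for a table-like object."""
    if isinstance(table_obj, Identifier):
        # Assumed handling: a one-part name lives in the default database.
        if len(table_obj.parts) == 1:
            name = (default_db_name, table_obj.parts[0])
        else:
            name = (table_obj.parts[0], table_obj.parts[-1])
    elif isinstance(table_obj, Join):
        # As in the hunk: take the alias from the first (left-most) table.
        return get_table_alias(table_obj.left, default_db_name)
    else:
        # As in the hunk: unknown object, return a placeholder.
        return default_db_name, 't', 't'

    if table_obj.alias is not None:
        return name + ('.'.join(table_obj.alias.parts),)
    # Assumed fallback: reuse the table name as the alias.
    return name + (name[1],)


nested = Join(
    left=Join(
        left=Identifier(('files', 'a'), alias=Identifier(('x',))),
        right=Identifier(('b',)),
    ),
    right=Identifier(('c',)),
)
print(get_table_alias(nested, 'mindsdb'))    # ('files', 'a', 'x')
print(get_table_alias(object(), 'mindsdb'))  # ('mindsdb', 't', 't')
```

Because the recursion always follows `left`, a chain of joins resolves to the alias of the left-most table, which matches the "get from first table" comment in the hunk.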

mindsdb/integrations/handlers/gmail_handler/README.md

Lines changed: 7 additions & 7 deletions
@@ -9,14 +9,14 @@ To see how the Gmail handler is used, let's walk through the steps to create a s
 
 ## Connect to the Gmail API
 
-To use the Gmail API we need to setup a Google Cloud Project and a Google Account with Gmail enabled.
+To use the Gmail API we need to set up a Google Cloud Project and a Google Account with Gmail enabled.
 
 Before proceeding further, we will need to enable the Gmail API from the Google Cloud Console.
 
-We will also need to create OAuth Client Ids for authenticating users, and possibly an Auth Consent Screen (if this is the first time we're setting up OAuth)
+We will also need to create OAuth Client Ids for authenticating users, and possibly an Auth Consent Screen (if this is the first time we're setting up OAuth).
 
 Setting up OAuth Client Id will give us a credentials file which we will need in our mindsdb setup. You can find more information on how to do
-this [here](https://developers.google.com/gmail/quickstart/python).
+this [here](https://developers.google.com/gmail/api/quickstart/python).
 
 **Optional:** The credentials file can be stored in the gmail_handler folder in
 the `mindsdb/integrations/handlers/gmail_handler` directory.
@@ -33,7 +33,7 @@ parameters = {
 This creates a database called mindsdb_gmail. This database ships with a table called emails that we can use to search for
 emails as well as to write emails.
 
-You can also create a database by giving the credentials file from a s3 pre signed url.To do this you need to pass in the credentials_file parameter as a signed url.For example:
+You can also create a database by giving the credentials file from a s3 pre signed url. To do this you need to pass in the credentials_file parameter as a signed url. For example:
 
 ~~~~sql
 CREATE DATABASE mindsdb_gmail
@@ -59,7 +59,7 @@ LIMIT 20;
 ~~~~
 This will search your Gmail inbox for any email which contains the text `alert` and is from `google.com` domain (notice the use of the wildcard `*`).
 
-The returned result should have ROWs like this
+The returned result should have ROWs like this,
 
 | id | message_id | thread_id | label_ids | sender | to | date | subject | snippet | history_id | size_estimate | body | attachments |
 | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
@@ -68,7 +68,7 @@ The returned result should have ROWs like this
 where
 * query - The search term. The query parameter supports all the search terms we can use with gmail. For more details please check [this link](https://support.google.com/mail/answer/7190)
 * label_ids - A comma separated string of labels to search for. E.g. "INBOX,UNREAD" will search for unread emails in inbox, "SENT" will search for emails in the sent folder.
-* include_spam_trash - BOOLEAN (TRUE / FALSE). By default it is FALSE. If included, the search will cover the SPAM and TRASH folders.
+* include_spam_trash - BOOLEAN (TRUE / FALSE). By default, it is FALSE. If included, the search will cover the SPAM and TRASH folders.
 
 ## Writing Emails
 
@@ -112,7 +112,7 @@ USING
 input_column = 'text_spammy',
 labels = ['ham', 'spam'];
 ~~~~
-* Then you can have to create a view of the email table that contains the snippet or the body of the email.For example by using the snippet:
+* Then you can have to create a view of the email table that contains the snippet or the body of the email. For example by using the snippet:
 ~~~~sql
 CREATE VIEW mindsdb.emails_text AS(
 SELECT snippet AS text_spammy

mindsdb/integrations/handlers/gmail_handler/gmail_handler.py

Lines changed: 5 additions & 1 deletion
@@ -12,6 +12,7 @@
 from mindsdb_sql.parser import ast
 from mindsdb.utilities import log
 from mindsdb_sql import parse_sql
+from mindsdb.interfaces.storage.model_fs import HandlerStorage
 
 import os
 import time
@@ -215,6 +216,8 @@ def __init__(self, name=None, **kwargs):
         self.service = None
         self.is_connected = False
 
+        self.storage = HandlerStorage(kwargs['integration_id'])
+
         emails = EmailsTable(self)
         self.emails = emails
         self._register_table('emails', emails)
@@ -239,7 +242,7 @@ def create_connection(self) -> object:
         creds = None
 
         # Get the current dir, we'll check for Token & Creds files in this dir
-        curr_dir = os.path.dirname(__file__)
+        curr_dir = self.storage.folder_get('config')
 
         token_file = os.path.join(curr_dir, 'token.json')
         creds_file = os.path.join(curr_dir, 'creds.json')
@@ -260,6 +263,7 @@ def create_connection(self) -> object:
             with open(token_file, 'w') as token:
                 token.write(creds.to_json())
 
+        self.storage.folder_sync('config')
         return build('gmail', 'v1', credentials=creds)
 
     def connect(self) -> object:
def connect(self) -> object:

requirements/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ sentry-sdk
 walrus==0.8.2
 flask-compress >= 1.0.0
 appdirs >= 1.0.0
-mindsdb-sql >= 0.6.4, < 0.7.0
+mindsdb-sql >= 0.6.5, < 0.7.0
 mindsdb-evaluator >= 0.0.7, < 0.1.0
 checksumdir >= 1.2.0
 duckdb == 0.6.0
