Giskard Scan Results

5 issues detected

Robustness 1

Performance 4

Your model seems to be sensitive to small perturbations in the input data. These perturbations can include adding typos, changing word order, or turning text into uppercase or lowercase. Common causes include:

Insufficient diversity in the training data
Overreliance on spurious correlations, such as the presence of specific words
Complex models with a large number of parameters, which tend to overfit the training data

To learn more about causes and solutions, check our guide on robustness issues.

Issues

1 major

Feature `text`

Add typos

Fail rate = 0.130

104/800 tested samples (13.0%) changed prediction after perturbation

800 samples affected (91.7% of dataset)


Description

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.0% of the cases. We expected the predictions not to be affected by this transformation.
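
The fail rate above is the share of tested samples whose predicted label changes after the transformation. The following is a minimal sketch of that ratio, not Giskard's implementation: `add_typos`, `fail_rate`, and `toy_predict` are hypothetical names, and the typo logic (random adjacent-letter swaps) is an assumed stand-in for the actual "Add typos" transformation.

```python
import random


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent letters at random (toy stand-in for a typo transformation)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


def fail_rate(predict, texts, perturb) -> float:
    """Fraction of samples whose predicted label changes after perturbation."""
    changed = sum(predict(t) != predict(perturb(t)) for t in texts)
    return changed / len(texts)


# Hypothetical keyword classifier, present only to make the sketch runnable.
def toy_predict(text: str) -> str:
    return "POSITIVE" if "good" in text else "NEGATIVE"


texts = ["a good film", "not my thing", "good fun throughout"]
print(fail_rate(toy_predict, texts, lambda t: add_typos(t, rate=0.3)))

# The reported figure is the same ratio: 104 changed out of 800 tested.
print(104 / 800)
```

A keyword model like `toy_predict` flips as soon as a typo lands inside its trigger word, which mirrors the spurious-correlation cause listed earlier in this report.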

Examples

	text	Add typos(text)	Original prediction	Prediction after perturbation
13	we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity .	we root for ( clara and paul ) , even like them , htough perhaps it 's an emotiom closer to pity .	POSITIVE (p = 0.96)	NEGATIVE (p = 0.99)
16	the emotions are raw and will strike a nerve with anyone who 's ever had family trauma .	the ekotions are raw andw ill strike a nerve with anyone wgo 's ever had family trauma .	POSITIVE (p = 1.00)	NEGATIVE (p = 0.60)
22	holden caulfield did it better .	holdsn caulfkeld did t better .	POSITIVE (p = 0.99)	NEGATIVE (p = 1.00)

Taxonomy

avid-effect:performance:P0201

We found some data slices in your dataset on which your model performance is lower than average. Performance bias may happen for different reasons:

Not enough examples in the low-performing data slice in the training set
Wrong labels in the training set in the low-performing data slice
Drift between your training set and test set

To learn more about causes and solutions, check our guide on performance bias.

Issues

2 major, 2 medium

`text_length(text)` >= 50.500 AND `text_length(text)` < 61.500

Precision = 0.759 (Global = 0.898)

15.50% lower than global

60 samples affected (6.9% of dataset)

Description

For records in the dataset where `text_length(text)` >= 50.500 AND `text_length(text)` < 61.500, the Precision is 15.5% lower than the global Precision.
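
The percentages in this section are relative gaps, not percentage-point differences: a Precision of 0.759 against a global 0.898 is 13.9 points lower in absolute terms, but about 15.5% lower relative to the global value. The sketch below recomputes the gap for the four slices in this report, assuming the formula (slice - global) / global; `relative_gap` is a hypothetical helper. Because the inputs are the rounded metrics shown here, the results match the reported figures only to within rounding.

```python
def relative_gap(slice_metric: float, global_metric: float) -> float:
    """Relative difference between a slice metric and the global metric."""
    return (slice_metric - global_metric) / global_metric


# (metric name, slice value, global value), copied from this report.
reported = [
    ("Precision", 0.759, 0.898),
    ("Recall", 0.826, 0.930),
    ("Recall", 0.871, 0.930),
    ("Recall", 0.875, 0.930),
]

for name, slice_value, global_value in reported:
    gap = relative_gap(slice_value, global_value)
    print(f"{name}: {gap:+.2%} relative to global")
```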

Examples

	text	text_length(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	61	NEGATIVE	POSITIVE (p = 1.00)
171	rarely has leukemia looked so shimmering and benign .	54	NEGATIVE	POSITIVE (p = 0.98)
183	the lower your expectations , the more you 'll enjoy it .	58	NEGATIVE	POSITIVE (p = 1.00)

Taxonomy

avid-effect:performance:P0204

`text_length(text)` >= 73.500 AND `text_length(text)` < 82.500

Recall = 0.826 (Global = 0.930)

11.19% lower than global

45 samples affected (5.2% of dataset)

Description

For records in the dataset where `text_length(text)` >= 73.500 AND `text_length(text)` < 82.500, the Recall is 11.19% lower than the global Recall.
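
A slice metric like this one can be reproduced by filtering records on the slice condition and computing Recall on the subset. The sketch below uses made-up rows, not data from this scan, and assumes POSITIVE is the positive class; `in_slice` and `recall` are hypothetical helpers, not Giskard functions.

```python
# Hypothetical records: text length, true label, predicted label.
rows = [
    {"text_length": 76, "label": "POSITIVE", "pred": "NEGATIVE"},
    {"text_length": 75, "label": "POSITIVE", "pred": "POSITIVE"},
    {"text_length": 82, "label": "POSITIVE", "pred": "POSITIVE"},
    {"text_length": 80, "label": "NEGATIVE", "pred": "NEGATIVE"},
    {"text_length": 40, "label": "POSITIVE", "pred": "POSITIVE"},
    {"text_length": 120, "label": "POSITIVE", "pred": "POSITIVE"},
]


def in_slice(row, low=73.5, high=82.5):
    """Slice condition from the report: low <= text_length < high."""
    return low <= row["text_length"] < high


def recall(records, positive="POSITIVE"):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for r in records if r["label"] == positive and r["pred"] == positive)
    fn = sum(1 for r in records if r["label"] == positive and r["pred"] != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


slice_rows = [r for r in rows if in_slice(r)]
slice_recall = recall(slice_rows)
global_recall = recall(rows)
print(f"slice recall = {slice_recall:.3f}, global recall = {global_recall:.3f}")
```

With only a few dozen samples in a slice, a handful of misclassified positives moves Recall noticeably, so it is worth inspecting the individual examples before drawing conclusions.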

Examples

	text	text_length(text)	label	Predicted `label`
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	76	POSITIVE	NEGATIVE (p = 1.00)
123	turns potentially forgettable formula into something strangely diverting .	75	POSITIVE	NEGATIVE (p = 0.99)
142	what better message than ` love thyself ' could young women of any size receive ?	82	POSITIVE	NEGATIVE (p = 0.99)

Taxonomy

avid-effect:performance:P0204

`text_length(text)` >= 165.500 AND `text_length(text)` < 179.500

Recall = 0.871 (Global = 0.930)

6.37% lower than global

49 samples affected (5.6% of dataset)

Description

For records in the dataset where `text_length(text)` >= 165.500 AND `text_length(text)` < 179.500, the Recall is 6.37% lower than the global Recall.

Examples

	text	text_length(text)	label	Predicted `label`
158	by getting myself wrapped up in the visuals and eccentricities of many of the characters , i found myself confused when it came time to get to the heart of the movie .	168	NEGATIVE	POSITIVE (p = 0.99)
266	a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors .	179	POSITIVE	NEGATIVE (p = 0.99)
282	while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer	166	POSITIVE	NEGATIVE (p = 1.00)

Taxonomy

avid-effect:performance:P0204

`text_length(text)` >= 151.500 AND `text_length(text)` < 165.500

Recall = 0.875 (Global = 0.930)

5.93% lower than global

59 samples affected (6.8% of dataset)

Description

For records in the dataset where `text_length(text)` >= 151.500 AND `text_length(text)` < 165.500, the Recall is 5.93% lower than the global Recall.

Examples

	text	text_length(text)	label	Predicted `label`
324	you 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an errant tear .	164	POSITIVE	NEGATIVE (p = 0.95)
673	drops you into a dizzying , volatile , pressure-cooker of a situation that quickly snowballs out of control , while focusing on the what much more than the why .	162	POSITIVE	NEGATIVE (p = 0.94)
692	sustains its dreamlike glide through a succession of cheesy coincidences and voluptuous cheap effects , not the least of which is rebecca romijn-stamos .	154	NEGATIVE	POSITIVE (p = 0.94)

Taxonomy

avid-effect:performance:P0204

What's next?

1. Generate a test suite from your scan results

# `results` is the scan report returned by giskard.scan(model, dataset)
test_suite = results.generate_test_suite("My first test suite")

2. Run your test suite

test_suite.run()