Logs clustering 

We need a procedure to group together failures whose error messages are nearly identical.

Our typical use case is to check whether failures attributed to a single option all produce the same error messages.

Let's take an example.
Using TUXML-analysis, we can get all configuration failures in which `AIC7XXX_BUILD_FIRMWARE` is activated:
`aic7xx_failures = rawtuxdata.query("AIC7XXX_BUILD_FIRMWARE == 'y' & vmlinux == -1")`

Then we use the `cid` column to retrieve all the logs, using the facilities of `bdd-tuxml-facility`:
```
logs_aic7xx_failures = [err_logs_configuration(cid) for cid in aic7xx_failures['cid']]
```

What remains open is how to cluster the logs.
We can start by looking for lines common to the logs (if any), by matching pre-defined "error patterns" we already know, or by using more advanced clustering: https://scikit-learn.org/0.18/auto_examples/text/document_clustering.html
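The first two ideas can be sketched with the standard library only. This is a minimal sketch, assuming each log is a plain string; the function names and the regular expressions in `ERROR_PATTERNS` are hypothetical placeholders, not patterns we have actually observed in TuxML logs.

```python
import re
from collections import defaultdict

# Hypothetical patterns, for illustration only: replace them with the
# error patterns actually seen in the logs.
ERROR_PATTERNS = {
    "missing_header": re.compile(r"fatal error: .* No such file or directory"),
    "undefined_reference": re.compile(r"undefined reference to"),
    "make_error": re.compile(r"make(\[\d+\])?: \*\*\* "),
}

def common_lines(logs):
    """Return the set of stripped, non-empty lines present in every log."""
    line_sets = [
        {ln.strip() for ln in log.splitlines() if ln.strip()} for log in logs
    ]
    return set.intersection(*line_sets) if line_sets else set()

def group_by_pattern(logs):
    """Map each pattern name to the indices of the logs that match it.

    A log matching several patterns appears in several groups; a log
    matching none is filed under "unknown".
    """
    groups = defaultdict(list)
    for i, log in enumerate(logs):
        matched = [name for name, rx in ERROR_PATTERNS.items() if rx.search(log)]
        for name in matched or ["unknown"]:
            groups[name].append(i)
    return dict(groups)
```

Logs that end up under `"unknown"` are the natural candidates for the more advanced clustering.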

Once we have clusters, we have to explain them:
 * is AIC7XXX_BUILD_FIRMWARE really the cause?
   * if so, why and to what extent do the error messages differ?
 * is it due to another option (or combination of options)? is there a masking effect?
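
One way to start answering these questions is to look at which option values are shared by every member of a cluster but not by all failures taken together. The sketch below assumes the configurations are available as a dictionary mapping each `cid` to its `{option: value}` dictionary, and the clusters as a dictionary mapping a label to a list of `cid`s; both structures and both function names are hypothetical.

```python
def shared_settings(configs, cids):
    """Return the (option, value) pairs that every listed configuration has."""
    items = [set(configs[cid].items()) for cid in cids]
    return set.intersection(*items) if items else set()

def distinctive_settings(configs, clusters):
    """For each cluster, keep the settings shared by all of its members
    but not shared by all failures taken together."""
    everywhere = shared_settings(configs, list(configs))
    return {
        label: shared_settings(configs, cids) - everywhere
        for label, cids in clusters.items()
    }

# Made-up cids, values, and cluster labels, for illustration only.
configs = {
    1: {"AIC7XXX_BUILD_FIRMWARE": "y", "AIC79XX_BUILD_FIRMWARE": "y"},
    2: {"AIC7XXX_BUILD_FIRMWARE": "y", "AIC79XX_BUILD_FIRMWARE": "y"},
    3: {"AIC7XXX_BUILD_FIRMWARE": "y", "AIC79XX_BUILD_FIRMWARE": "n"},
}
clusters = {"A": [1, 2], "B": [3]}
result = distinctive_settings(configs, clusters)
```

A setting that shows up as distinctive for a cluster is only a candidate explanation: with thousands of options, some will be shared by chance, so it still has to be checked by hand or with more configurations.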

We can refine the query as well: 
`rawtuxdata.query("AIC7XXX_BUILD_FIRMWARE == 'y' & AIC79XX_BUILD_FIRMWARE == 'y' & vmlinux == -1")` 
to investigate masking effects between individual options that lead to failures.

Another idea would be to consider all failures
`rawtuxdata.query("vmlinux == -1")` 
and analyze all logs... But too many clusters may pop up. We can certainly try, but I suggest starting with individual options first.
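
To keep the number of clusters manageable, it may help to mask the details that vary from one build to another (line numbers, addresses) before comparing logs. The sketch below is a standard-library stand-in, not the scikit-learn approach linked above: it uses `difflib.SequenceMatcher` with a greedy assignment, and the `normalize` rules and the `threshold` value are assumptions to be tuned.

```python
import re
from difflib import SequenceMatcher

def normalize(log):
    """Mask build-specific details so near-identical messages compare equal."""
    log = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", log)  # addresses
    log = re.sub(r"\d+", "<N>", log)               # line numbers, counters
    return "\n".join(ln.strip() for ln in log.splitlines() if ln.strip())

def greedy_cluster(logs, threshold=0.8):
    """Put each log in the first cluster whose representative is similar enough.

    Returns a list of clusters, each a list of indices into `logs`.
    """
    representatives, clusters = [], []
    for i, log in enumerate(logs):
        norm = normalize(log)
        for k, rep in enumerate(representatives):
            ratio = SequenceMatcher(None, rep, norm, autojunk=False).ratio()
            if ratio >= threshold:
                clusters[k].append(i)
                break
        else:
            # No existing cluster is close enough: start a new one.
            representatives.append(norm)
            clusters.append([i])
    return clusters
```

The result depends on the order of the logs, and the pairwise comparison is slow on long logs, so this is only suited to a first exploration on a subset of the failures.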







