vdas_20120230
by Marie Anne Bizouard, last modified 2012-03-01 21:06
VDB special call
Agenda:
1. VDB status progress
2. Questions to be answered
Attendance: Gary, Elena, Didier, MAB
Gary's document
Gary:
A test machine has been set up at Cascina to test VDB developments. Gary proposes drastically reducing the number of tables.
3 fundamental tables:
- DQ flag (name, ...)
- Version (version, upload dates)
- Segment
Registry (contains all table names): avoids duplicating information such as the flag name each time a new version is uploaded.
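A minimal sketch of the proposed three-table layout, using Python's built-in sqlite3 module as a stand-in for the MySQL server. The table names tbl_dq_flag, tbl_dq_flag_versions and tbl_segments come from the discussion; the column names (dq_flag_id, version_id, upload_date, start_gps, stop_gps) and the helper add_version are hypothetical. It illustrates the registry idea: a flag name is stored once, and each newly uploaded version only references it by id.

```python
import sqlite3
from typing import Iterable, Tuple

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three core tables (hypothetical column names)."""
    conn.executescript("""
        CREATE TABLE tbl_dq_flag (
            dq_flag_id   INTEGER PRIMARY KEY,
            dq_flag_name TEXT NOT NULL,
            dq_flag_itf  TEXT NOT NULL,
            UNIQUE (dq_flag_name, dq_flag_itf)
        );
        CREATE TABLE tbl_dq_flag_versions (
            version_id      INTEGER PRIMARY KEY,
            dq_flag_id      INTEGER NOT NULL REFERENCES tbl_dq_flag (dq_flag_id),
            dq_flag_version INTEGER NOT NULL,
            upload_date     TEXT NOT NULL,
            UNIQUE (dq_flag_id, dq_flag_version)
        );
        CREATE TABLE tbl_segments (
            segment_id INTEGER PRIMARY KEY,
            version_id INTEGER NOT NULL REFERENCES tbl_dq_flag_versions (version_id),
            start_gps  INTEGER NOT NULL,
            stop_gps   INTEGER NOT NULL
        );
    """)

def add_version(conn: sqlite3.Connection, name: str, itf: str, version: int,
                upload_date: str, segments: Iterable[Tuple[int, int]]) -> int:
    """Upload one version of a flag; the flag name itself is stored only once."""
    # Register the flag the first time it is seen; later uploads reuse that row.
    conn.execute(
        "INSERT OR IGNORE INTO tbl_dq_flag (dq_flag_name, dq_flag_itf) VALUES (?, ?)",
        (name, itf))
    (flag_id,) = conn.execute(
        "SELECT dq_flag_id FROM tbl_dq_flag WHERE dq_flag_name = ? AND dq_flag_itf = ?",
        (name, itf)).fetchone()
    # The version row points to the flag by id instead of repeating its name.
    cur = conn.execute(
        "INSERT INTO tbl_dq_flag_versions (dq_flag_id, dq_flag_version, upload_date) "
        "VALUES (?, ?, ?)", (flag_id, version, upload_date))
    version_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tbl_segments (version_id, start_gps, stop_gps) VALUES (?, ?, ?)",
        [(version_id, start, stop) for start, stop in segments])
    conn.commit()
    return version_id
```

Uploading two versions of the same hypothetical flag with this sketch leaves one row in tbl_dq_flag and two rows in tbl_dq_flag_versions.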
Simplification of VDB tables
Relation with segdb: not clear; there are segdb-related tables. Needs discussion.
Number of versions: ~10
Number of flags: ~100
Number of segments: from ~10 up to several 10^6
DQ flag version: integer instead of text?
Discussion about online/offline/SciMon segments: should the version number encode this information?
So far, version 1 is reserved for online lists and any offline list comes with version >= 2. SegDB makes no such distinction (to be checked with the LIGO folks). We need to clarify this in Boston and adopt a common definition.
Agreement that the version should be an integer used only as a version number. Other information, such as online/offline or SciMon, should be stored in a separate string field. Elena proposes also storing information such as the "glitch family"; this could easily be added later.
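A sketch of the agreed convention as a hypothetical Python record, DQFlagVersion: the version is a plain integer, and the online/offline/SciMon nature lives in a separate string field. The field names, the set of allowed kinds and the positive-version check are assumptions, not part of the agreed design; Elena's "glitch family" appears as an optional field to show how it could be added later.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for the separate string field.
ALLOWED_KINDS = {"online", "offline", "scimon"}

@dataclass(frozen=True)
class DQFlagVersion:
    flag_name: str
    version: int                         # a version number and nothing else
    kind: str                            # "online", "offline" or "scimon"
    glitch_family: Optional[str] = None  # possible later extension

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.version, bool) or not isinstance(self.version, int):
            raise TypeError("version must be an integer")
        if self.version < 1:
            raise ValueError("version must be a positive integer")
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
```

Keeping the kind out of the version number means that "latest version" and "online or offline" can be queried independently.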
Discussion of SciMon lists: there is a problem with overlaps, not fully clarified.
The new VDB server will be tested in the coming weeks.
Working plan:
1/ First tests: upload the VSR1/2/3/4 lists
2/ Work on the interface
3/ More tests: size of the segment lists (in the past we had problems with lists of >10^6 segments); online vs offline.
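For test 3, one common approach to very large lists is to insert them in fixed-size batches inside a single transaction rather than one INSERT per segment. A minimal sketch with Python's sqlite3, assuming a hypothetical file format of one "start stop" pair of GPS times per line (with "#" comment lines), a hypothetical tbl_segments(version_id, start_gps, stop_gps) table and an arbitrary batch size of 10,000:

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Tuple

Segment = Tuple[float, float]

def parse_segments(lines: Iterable[str]) -> Iterator[Segment]:
    """Yield (start, stop) GPS pairs, assuming one 'start stop' pair per line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        start, stop = line.split()[:2]
        yield float(start), float(stop)

def upload_in_batches(conn: sqlite3.Connection, version_id: int,
                      segments: Iterable[Segment], batch_size: int = 10_000) -> int:
    """Insert segments batch by batch inside one transaction; return the count."""
    it = iter(segments)
    total = 0
    with conn:  # commits on success, rolls back if any batch fails
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            conn.executemany(
                "INSERT INTO tbl_segments (version_id, start_gps, stop_gps) VALUES (?, ?, ?)",
                [(version_id, start, stop) for start, stop in batch])
            total += len(batch)
    return total
```

Because parse_segments is a generator, at most one batch is held in memory at a time, e.g. `upload_in_batches(conn, 1, parse_segments(open(path)))` for a hypothetical list file at `path`.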
Next meeting: possibly next Thursday; otherwise at a VDQ or VDAS call, but after Boston.
Email exchange:
QUESTIONS FOR THE MEETING [Gary]:
--------------------------
1. Data quality flags - Is -1 actually necessary? Surely it is a de facto
default state?
2. In relation to the association of a DQ flag version to a run, do you
define this as an argument passed to the database when uploading a new
list?
3. Should the DQ flag 'Recommended Category', as mentioned in the document
cited by Marie Anne earlier, be added?
4. In relation to the DQ flag versions table, an 'undone by' field exists.
Is this really necessary, given that, unless specifically requested, only
the latest version of a DQ flag is returned?
5. Is it necessary to know which host a user used to upload a new version
of a DQ flag? The same question stands for the start and stop times of the
transactions.
6. Do Scimon entries refer to specific time segments or to DQ flag versions?
7. In relation to Scimon entries, am I correct in stating that up until
now the whole process has been handled by the Logbook or was part of the
information handled in the Logbook and part of it entered at the level of
the VDB web interface?
8. When uploading lists via the toolkit, approximately how big (in terms of
bytes and lines of text) are the files?
9. Do we have any idea as to how the new version of SegDB will be structured?
10. Do any important fields appear to be missing from the proposed new
database structure?
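Question 4 assumes that only the latest version of a DQ flag is returned unless a specific one is requested. A minimal sketch of that default with sqlite3, assuming a simplified, hypothetical layout in which tbl_dq_flag_versions carries dq_flag_name directly and tbl_segments references versions through version_id:

```python
import sqlite3
from typing import List, Optional, Tuple

def get_segments(conn: sqlite3.Connection, flag_name: str,
                 version: Optional[int] = None) -> Tuple[Optional[int], List[tuple]]:
    """Return (version, segments) for a flag: the latest version unless one is requested."""
    if version is None:
        # The latest version is simply the highest version number of the flag.
        (version,) = conn.execute(
            "SELECT MAX(dq_flag_version) FROM tbl_dq_flag_versions WHERE dq_flag_name = ?",
            (flag_name,)).fetchone()
        if version is None:  # unknown flag
            return None, []
    rows = conn.execute(
        "SELECT s.start_gps, s.stop_gps FROM tbl_segments AS s "
        "JOIN tbl_dq_flag_versions AS v ON s.version_id = v.version_id "
        "WHERE v.dq_flag_name = ? AND v.dq_flag_version = ? "
        "ORDER BY s.start_gps",
        (flag_name, version)).fetchall()
    return version, rows
```

In this sketch "latest" is derived from the integer version numbers alone, so the default needs no extra bookkeeping column.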
Didier's answers:
Hi Gary,
Thanks for this document, which will be very valuable for further discussions.
By the way, I may have forgotten or missed something: when are we supposed to
present VDB at the Boston meeting? In which session?
Now, a few answers before we meet:
6. Do Scimon entries refer to specific time segments or to DQ flag versions?
--> Each SciMon entry corresponds to a time segment of a DQ flag whose name
is unique (SCI_something) and whose version is unique ("sci"). This follows the same
model as the online DQ segments sent by SegOnline to VDB (name "V1:something"
and version "online", or "v1" in the future if we decide on this change).
7. In relation to Scimon entries, am I correct in stating that up until
now the whole process has been handled by the Logbook or was part of the
information handled in the Logbook and part of it entered at the level of
the VDB web interface?
--> All SciMon entries are made using the dedicated VDB web interface.
Some of them were entered by people on shift, others by VDQ
people after they had scanned the logbook.
8. When uploading lists via the toolkit, approximately how big (in terms of
bytes and lines of text) are the files?
--> It varies a lot: from 100 lines and 1 kB up to 800,000 lines and several MB.
9. Do we have any idea as to how the new version of SegDB will be structured?
--> I have not followed the development of segdb2, so I do not know.
10. Do any important fields appear to be missing from the proposed new
database structure?
--> I will try to read your document before we meet! 8-)
Didier
Didier's second email:
Hi Gary,
After reading your document, I have a few comments before we meet at VDAS
(because I feel more comfortable writing than speaking, and because
"spoken words fly away, written words remain"):
p.4, In 1.2
Following the changes in the VDB structure, modifications will be needed
not only in the VDB web interface and VDBtk_segment but also in the
VDB library (used by SegOnline to feed VDB online) and in the
SciMon DQ web interface.
p.7
SEGDBMAP may not be so obsolete. I suspect it contained the version of the last
segment lists uploaded to segdb and was updated by querying a LIGO PHP script.
To be checked with LIGO people.
p.10
I do not know what dq_flag_uri is. I suppose that you meant dq_flag_url,
as written in Figure 2.
p.11
In tbl_dq_flag_versions, the field dq_flag_version is an INT.
I would prefer a small VARCHAR so that we can keep versions
such as "v2", "sci" or "online". The alternative would be to keep an INT
and replace "online" by 1, "sci" by -1, and any other fixed version of that kind by a negative value.
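The INT alternative amounts to a small, reversible mapping between version labels and integers. A sketch in Python; the dictionary FIXED_VERSIONS and the function names are hypothetical, and only "online" -> 1 and "sci" -> -1 come from the email. Note that "v1" and "online" both map to 1, consistent with version 1 being reserved for online lists, so decoding 1 always gives "online".

```python
# Hypothetical table of fixed (non-numbered) versions; only "online" -> 1 and
# "sci" -> -1 come from the email. Further fixed versions would get further
# negative codes.
FIXED_VERSIONS = {"online": 1, "sci": -1}

def version_to_int(label: str) -> int:
    """Encode a version label such as 'v2', 'online' or 'sci' as an INT."""
    label = label.strip().lower()
    if label in FIXED_VERSIONS:
        return FIXED_VERSIONS[label]
    if label.startswith("v") and label[1:].isdecimal():
        return int(label[1:])  # 'v2' -> 2
    raise ValueError(f"unknown version label: {label!r}")

def int_to_version(code: int) -> str:
    """Decode an INT back to a label; fixed versions take precedence."""
    for label, value in FIXED_VERSIONS.items():
        if value == code:
            return label
    return f"v{code}"
```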
p.11
In tbl_dq_flag_versions, the field dq_flag_version_data_taking only needs a VARCHAR to hold
"VSR2", "VSR3", etc. I do not see the need to point to a value table.
p.11
In tbl_dq_flag_versions, the field dq_flag_itf can be removed because it is redundant with
the field dq_flag_itf of the table tbl_dq_flag
p.12
In tbl_users, I wonder how the MySQL stored procedure will work when SegOnline
feeds VDB with online segments.
p.12, In 2.2.1.6
You say you need tbl_values because it avoids constantly writing heavy text strings
to every row. But this table is used only by tbl_dq_flag and tbl_dq_flag_versions, which will not have
many rows (only tbl_segments will potentially have millions of rows): tbl_dq_flag
and tbl_dq_flag_versions will contain only a few hundred or a few thousand rows.
So, for now, I do not see any need for tbl_values. The same goes for tbl_values_groups.
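A rough order-of-magnitude check of this argument, assuming (hypothetically) 5,000 rows in tbl_dq_flag and tbl_dq_flag_versions, 64 bytes of inline text per row, and one large list of 10^6 segments stored as two 8-byte GPS times each (index and row overhead ignored):

```latex
\begin{aligned}
\text{inline text} &\approx 5\,000 \times 64\ \text{B} = 3.2\times10^{5}\ \text{B} \approx 0.3\ \text{MB},\\
\text{one large segment list} &\approx 10^{6} \times 16\ \text{B} = 1.6\times10^{7}\ \text{B} = 16\ \text{MB}.
\end{aligned}
```

Under these assumptions the repeated strings amount to about 2% of a single large segment list, so factoring them out into a value table would save little.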
p.13
For SciMon DQ flags, I would add a pointer in tbl_segments to a new table
"tbl_descript" with a field "description" describing the event seen by the shifter
and a field "insert_time" holding the date on which the shifter inserted this
SciMon DQ segment.
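A minimal sketch of this proposal with sqlite3. The table tbl_descript and its fields description and insert_time come from the email; the other column names, the nullable descript_id pointer and the helper add_scimon_segment are hypothetical. Non-SciMon segments simply leave the pointer NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tbl_descript (
    descript_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,   -- event seen by the shifter
    insert_time TEXT NOT NULL    -- when the shifter inserted the SciMon segment
);
CREATE TABLE tbl_segments (
    segment_id  INTEGER PRIMARY KEY,
    version_id  INTEGER NOT NULL,
    start_gps   INTEGER NOT NULL,
    stop_gps    INTEGER NOT NULL,
    descript_id INTEGER REFERENCES tbl_descript (descript_id)  -- NULL for non-SciMon segments
);
"""

def add_scimon_segment(conn: sqlite3.Connection, version_id: int, start: int,
                       stop: int, description: str, insert_time: str) -> int:
    """Store the shifter's description, then the segment pointing to it."""
    cur = conn.execute(
        "INSERT INTO tbl_descript (description, insert_time) VALUES (?, ?)",
        (description, insert_time))
    seg = conn.execute(
        "INSERT INTO tbl_segments (version_id, start_gps, stop_gps, descript_id) "
        "VALUES (?, ?, ?, ?)", (version_id, start, stop, cur.lastrowid))
    return seg.lastrowid
```

A LEFT JOIN from tbl_segments to tbl_descript then returns the description where one exists and NULL otherwise.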
p.13
I would also add to tbl_dq_flag_versions the fields "dq_flag_freqmin" and "dq_flag_freqmax",
which are the edges of the frequency band where this flag applies.
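A sketch of how the two proposed fields could be used, assuming (hypothetically) that the edges are in Hz, inclusive, and that a missing edge means the band is unbounded on that side. The class FlagBand and its methods are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FlagBand:
    """Frequency band of a DQ flag version (Hz, inclusive; None = unbounded)."""
    dq_flag_freqmin: Optional[float] = None
    dq_flag_freqmax: Optional[float] = None

    def _edges(self) -> Tuple[float, float]:
        low = self.dq_flag_freqmin if self.dq_flag_freqmin is not None else float("-inf")
        high = self.dq_flag_freqmax if self.dq_flag_freqmax is not None else float("inf")
        return low, high

    def applies_at(self, freq_hz: float) -> bool:
        """True if the flag applies at this frequency."""
        low, high = self._edges()
        return low <= freq_hz <= high

    def overlaps(self, band_low: float, band_high: float) -> bool:
        """True if the flag's band intersects [band_low, band_high]."""
        low, high = self._edges()
        return low <= band_high and band_low <= high
```

With such a check, an analysis restricted to a given band could, for instance, skip flags whose band does not overlap it.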
Speak to you soon,
Didier