Beverage Preference Data set

March 25, 2016
Tohru Iwasaki & Tetsuo Furukawa
Department of Human Intelligence Systems
Kyushu Institute of Technology
This database contains survey data on beverage preferences collected from 604 respondents.
The data record how frequently each respondent drinks each of 14 beverages in each of 11 different situations. The entire dataset is therefore represented by a 3-dimensional array, i.e., a tensor of order 3.

We collected this dataset for our research on nonlinear tensor decomposition in the field of machine learning. We hope it will serve as a benchmark for people interested in tensor data analysis, relational data analysis, recommender systems, etc.

This database also contains food and leisure preference data collected from the same respondents, so it can also be used for research on multi-view learning.

You can use this database under the following conditions.
These data first appeared in the following paper.
T. Iwasaki and T. Furukawa
Neural Networks, Vol.77, pp.107-125, 2016.
doi:10.1016/j.neunet.2016.01.013

File contents

Download

The archive file contains the following three files. This dataset was collected through a questionnaire survey of 604 Japanese respondents. Respondents who gave the same score to every question were removed in advance and are not included in this dataset. There are no missing data in this database.

The respondents were asked to rate how frequently they drink each of 14 beverages in each of 11 situations. Each respondent therefore gave 14 x 11 = 154 ratings, and the entire dataset is represented by an array of (604 respondents) x (14 beverages) x (11 situations). The chosen beverages are commonly sold at supermarkets in Japan, usually in plastic bottles.
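Many matrix-based methods expect one row per respondent. As a minimal sketch, assuming the ratings are held as nested Python lists indexed [respondent][beverage][situation] with 0-based indices (the function names are illustrative and not part of the dataset), the order-3 array can be unfolded so that each respondent's 14 x 11 matrix becomes one row of 154 values. The toy tensor below uses synthetic values, not survey data.

```python
def unfold_respondents(tensor):
    """Flatten each respondent's (beverage x situation) matrix into one row."""
    return [[value for row in block for value in row] for block in tensor]


def column_of(beverage, situation, n_situations=11):
    """0-based column of (beverage, situation) in the unfolded matrix."""
    return beverage * n_situations + situation


# Toy tensor with synthetic values (NOT survey data): 2 x 3 x 2, where
# entry (i, j, k) is 100*i + 10*j + k so that positions are easy to trace.
toy = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)]
       for i in range(2)]
flat = unfold_respondents(toy)

print(len(flat), len(flat[0]))                   # rows and columns of the unfolding
print(flat[1][column_of(2, 1, n_situations=2)])  # entry (1, 2, 1) of the toy tensor
```

Applied to the full dataset, the same unfolding would give a 604 x 154 matrix in which the rating of beverage j in situation k sits in column j * 11 + k.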

Data Format of Beverage Preference Data (Beverage604.txt)

The data file "Beverage604.txt" consists of 604 blocks, one per respondent. Each block is a 14 x 11 matrix of ratings. A rating of 5 means that the respondent drinks the beverage frequently in that situation, and 1 means that he/she drinks it rarely. Rows correspond to beverages and columns to situations.

For example, the first block is as follows.
2 5 1 1 4 5 1 3 2 3 5	# Coke
2 3 2 2 5 4 1 1 2 3 2	# Soda pop (Seven up, etc.)
1 2 1 1 3 3 2 4 2 5 4	# Ginger ale
2 3 1 2 3 3 3 1 2 2 2	# Melon soda
5 3 3 5 4 3 5 2 1 2 1	# Orange juice
3 1 3 5 4 3 3 2 1 2 1	# Apple juice
5 4 4 4 3 1 5 3 5 4 2	# Vegetable juice
4 1 5 2 1 1 4 3 3 1 3	# Black tea (with sugar)
1 3 2 2 1 1 2 5 3 5 5	# Oolong tea (Chinese tea)
5 4 5 4 1 3 4 5 5 1 5	# Green tea (Japanese tea)
4 1 5 5 2 2 5 4 4 1 4	# Cafe au lait (Coffee with milk)
1 3 3 3 2 3 3 1 1 2 1	# Lactic drink
3 5 4 3 5 5 3 5 5 5 4	# Mineral water
3 5 2 1 5 5 1 2 3 2 2	# Isotonic drink
This first block holds the ratings of the first respondent. Such matrices are repeated 604 times, separated by blank lines.
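The block layout described above can be read with a short parser. The following is a minimal sketch, not an official loader: it assumes blocks are separated by one or more blank lines, and it strips anything after a "#" because it is not clear whether the beverage annotations shown in the example appear in the file itself. The names parse_blocks and FIRST_BLOCK are illustrative.

```python
def parse_blocks(text, n_rows=14, n_cols=11):
    """Parse blank-line-separated rating blocks into nested lists.

    Returns a list of blocks; each block is a list of n_rows rows,
    and each row is a list of n_cols integer ratings.
    """
    blocks, current = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # Blank line: close the block collected so far, if any.
            if current:
                blocks.append(current)
                current = []
            continue
        # Assumption: a trailing "# ..." annotation may or may not be present.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line
        row = [int(token) for token in line.split()]
        if len(row) != n_cols:
            raise ValueError(f"expected {n_cols} ratings, got {len(row)}")
        current.append(row)
    if current:
        blocks.append(current)
    for number, block in enumerate(blocks, start=1):
        if len(block) != n_rows:
            raise ValueError(f"block {number} has {len(block)} rows, expected {n_rows}")
    return blocks


# The first block, copied from the example above.
FIRST_BLOCK = """\
2 5 1 1 4 5 1 3 2 3 5   # Coke
2 3 2 2 5 4 1 1 2 3 2   # Soda pop (Seven up, etc.)
1 2 1 1 3 3 2 4 2 5 4   # Ginger ale
2 3 1 2 3 3 3 1 2 2 2   # Melon soda
5 3 3 5 4 3 5 2 1 2 1   # Orange juice
3 1 3 5 4 3 3 2 1 2 1   # Apple juice
5 4 4 4 3 1 5 3 5 4 2   # Vegetable juice
4 1 5 2 1 1 4 3 3 1 3   # Black tea (with sugar)
1 3 2 2 1 1 2 5 3 5 5   # Oolong tea (Chinese tea)
5 4 5 4 1 3 4 5 5 1 5   # Green tea (Japanese tea)
4 1 5 5 2 2 5 4 4 1 4   # Cafe au lait (Coffee with milk)
1 3 3 3 2 3 3 1 1 2 1   # Lactic drink
3 5 4 3 5 5 3 5 5 5 4   # Mineral water
3 5 2 1 5 5 1 2 3 2 2   # Isotonic drink
"""

blocks = parse_blocks(FIRST_BLOCK)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```

For the real file, passing the full contents of "Beverage604.txt" to parse_blocks should yield 604 blocks if the file follows the layout described here.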

The 11 situations are in the following order.
Column 1: Deskwork or studying
Column 2: Outdoor work
Column 3: Break time or teatime
Column 4: Indoor leisure (watching video, etc.)
Column 5: Sports/exercise time
Column 6: Outdoor leisure
Column 7: In the car/train
Column 8: Lunch time
Column 9: Awakening time
Column 10: Bedtime 
Column 11: Party time
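Because the columns carry no labels in the data file, it can help to attach the situation names in code. The sketch below pairs the column order listed above with one row of the example block (the green tea row of the first respondent). The names SITUATIONS and label_row are illustrative.

```python
# Situation labels in column order, as listed above.
SITUATIONS = [
    "Deskwork or studying",
    "Outdoor work",
    "Break time or teatime",
    "Indoor leisure (watching video, etc.)",
    "Sports/exercise time",
    "Outdoor leisure",
    "In the car/train",
    "Lunch time",
    "Awakening time",
    "Bedtime",
    "Party time",
]


def label_row(row):
    """Map each situation name to the rating in the matching column."""
    if len(row) != len(SITUATIONS):
        raise ValueError(f"expected {len(SITUATIONS)} ratings, got {len(row)}")
    return dict(zip(SITUATIONS, row))


# Green tea row of the first respondent, copied from the example block.
green_tea = [5, 4, 5, 4, 1, 3, 4, 5, 5, 1, 5]
labelled = label_row(green_tea)

# Situations in which this respondent drinks green tea most frequently.
frequent = [name for name, rating in labelled.items() if rating == 5]
print(frequent)
```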

Data Format of Side Information Data (Beverage604-side.txt)

"Beverage604-side.txt" consists of 604 lines corresponding to the 604 respondents. The side information consists of 19 items. For example, lines 1-5 of the data file correspond to the data of the respondent 1 to 5, which is as follows.
1 80 2 5 5 5 5 3 5 5 3 3 1 4 1 3 1 1 5	#Respondent 1
1 55 2 5 4 4 4 4 5 5 5 5 3 5 2 4 4 5 4	#Respondent 2
2 57 2 5 4 4 4 3 4 4 4 4 4 1 1 1 3 3 4	#Respondent 3
2 48 2 5 5 5 5 2 3 5 5 4 3 5 1 2 1 3 3	#Respondent 4
1 59 1 4 1 1 1 1 4 4 4 4 2 5 5 4 1 1 5	#Respondent 5
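The side information can be read line by line in the same spirit. This is a minimal sketch that only checks that each line holds 19 integers; it makes no assumption about what the individual items mean, and it strips anything after a "#" in case the respondent annotations shown above are present in the file. The names parse_side_info and SAMPLE are illustrative.

```python
def parse_side_info(text, n_items=19):
    """Parse one line of n_items integers per respondent."""
    rows = []
    for raw in text.splitlines():
        # Assumption: a trailing "# ..." annotation may or may not be present.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank or comment-only lines
        row = [int(token) for token in line.split()]
        if len(row) != n_items:
            raise ValueError(f"expected {n_items} items, got {len(row)}")
        rows.append(row)
    return rows


# Lines 1-5, copied from the example above.
SAMPLE = """\
1 80 2 5 5 5 5 3 5 5 3 3 1 4 1 3 1 1 5  #Respondent 1
1 55 2 5 4 4 4 4 5 5 5 5 3 5 2 4 4 5 4  #Respondent 2
2 57 2 5 4 4 4 3 4 4 4 4 4 1 1 1 3 3 4  #Respondent 3
2 48 2 5 5 5 5 2 3 5 5 4 3 5 1 2 1 3 3  #Respondent 4
1 59 1 4 1 1 1 1 4 4 4 4 2 5 5 4 1 1 5  #Respondent 5
"""

side = parse_side_info(SAMPLE)
print(len(side), len(side[0]))
```

Since both files list respondents in the same order, row i of the side information should describe the same respondent as block i of "Beverage604.txt".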

Contact address

If you have any questions, please contact

Tetsuo Furukawa
furukawa@brain.kyutech.ac.jp

Department of Human Intelligence Systems
Kyushu Institute of Technology