Defining an instance key (index number) for Cloud ML predictions

2023-12-02

I followed the 'Getting Started' Cloud ML Engine tutorial and deployed the model. I can pass an input file containing JSON instances to the batch prediction service, and it returns a file containing the predictions. How can I pass an instance key (index number) through the graph unchanged, so that each prediction contains the key and I know which JSON prediction belongs to which JSON input? It can probably be done by adding or changing a few lines in the original tutorial code (also copy-pasted below). Can someone help me? I'm fairly new to TensorFlow, so a detailed description would be greatly appreciated. Sample code or a tutorial would also be very helpful... The 'Getting Started' sample code consists of the two files copy-pasted below:
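For context, a keyed batch-prediction input file would hold one JSON instance per line, each carrying an extra key field alongside the census features, and the goal is for the output file to echo that key next to each prediction. An illustrative, abbreviated sketch (the key field and the exact output fields are hypothetical until the exported model actually forwards them):

# input file: one JSON instance per line (census feature list abbreviated)
{"key": 0, "age": 25, "workclass": " Private", "education": " 11th", "hours_per_week": 40}
{"key": 1, "age": 38, "workclass": " Private", "education": " HS-grad", "hours_per_week": 50}

# desired output file: each prediction carries the matching key
{"key": 0, "probabilities": [0.93, 0.07]}
{"key": 1, "probabilities": [0.41, 0.59]}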

model.py

# Copyright 2016 Google Inc. All Rights Reserved. Licensed under the Apache
# License, Version 2.0 (the "License"); you may not use this file except in
# compliance with the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.

"""Define a Wide + Deep model for classification on structured data."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import multiprocessing

import six
import tensorflow as tf


# Define the format of your input data including unused columns
CSV_COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
               'marital_status', 'occupation', 'relationship', 'race', 'gender',
               'capital_gain', 'capital_loss', 'hours_per_week',
               'native_country', 'income_bracket']
CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''],
                       [0], [0], [0], [''], ['']]
LABEL_COLUMN = 'income_bracket'
LABELS = [' <=50K', ' >50K']

# Define the initial ingestion of each feature used by your model.
# Additionally, provide metadata about the feature.
INPUT_COLUMNS = [
    # Categorical base columns

    # For categorical columns with known values we can provide lists
    # of values ahead of time.
    tf.feature_column.categorical_column_with_vocabulary_list(
        'gender', [' Female', ' Male']),

    tf.feature_column.categorical_column_with_vocabulary_list(
        'race',
        [' Amer-Indian-Eskimo', ' Asian-Pac-Islander',
         ' Black', ' Other', ' White']
    ),
    tf.feature_column.categorical_column_with_vocabulary_list(
        'education',
        [' Bachelors', ' HS-grad', ' 11th', ' Masters', ' 9th',
         ' Some-college', ' Assoc-acdm', ' Assoc-voc', ' 7th-8th',
         ' Doctorate', ' Prof-school', ' 5th-6th', ' 10th',
         ' 1st-4th', ' Preschool', ' 12th']),
    tf.feature_column.categorical_column_with_vocabulary_list(
        'marital_status',
        [' Married-civ-spouse', ' Divorced', ' Married-spouse-absent',
         ' Never-married', ' Separated', ' Married-AF-spouse', ' Widowed']),
    tf.feature_column.categorical_column_with_vocabulary_list(
        'relationship',
        [' Husband', ' Not-in-family', ' Wife', ' Own-child', ' Unmarried',
         ' Other-relative']),
    tf.feature_column.categorical_column_with_vocabulary_list(
        'workclass',
        [' Self-emp-not-inc', ' Private', ' State-gov',
         ' Federal-gov', ' Local-gov', ' ?', ' Self-emp-inc',
         ' Without-pay', ' Never-worked']
    ),

    # For columns with a large number of values, or unknown values
    # We can use a hash function to convert to categories.
    tf.feature_column.categorical_column_with_hash_bucket(
        'occupation', hash_bucket_size=100, dtype=tf.string),
    tf.feature_column.categorical_column_with_hash_bucket(
        'native_country', hash_bucket_size=100, dtype=tf.string),

    # Continuous base columns.
    tf.feature_column.numeric_column('age'),
    tf.feature_column.numeric_column('education_num'),
    tf.feature_column.numeric_column('capital_gain'),
    tf.feature_column.numeric_column('capital_loss'),
    tf.feature_column.numeric_column('hours_per_week'),
]

UNUSED_COLUMNS = set(CSV_COLUMNS) - {col.name for col in INPUT_COLUMNS} - \
    {LABEL_COLUMN}


def build_estimator(config, embedding_size=8, hidden_units=None):
  """Build a wide and deep model for predicting income category.

  Wide and deep models use deep neural nets to learn high-level abstractions
  about complex features or interactions between such features.
  These models then combine the outputs from the DNN with a linear model
  trained on simpler features. This provides a balance between power and
  speed that is effective on many structured data problems.

  You can read more about wide and deep models here:
  https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html

  To define the model we can use the prebuilt DNNLinearCombinedClassifier
  class, and need only define the data transformations particular to our
  dataset, and then assign these (potentially) transformed features to either
  the DNN or the linear portion of the model.

  Args:
    config: tf.estimator.RunConfig defining the runtime environment for the
      estimator (including model_dir).
    embedding_size: int, the number of dimensions used to represent categorical
      features when providing them as inputs to the DNN.
    hidden_units: [int], the layer sizes of the DNN (input layer first).
  Returns:
    A DNNLinearCombinedClassifier
  """
  (gender, race, education, marital_status, relationship,
   workclass, occupation, native_country, age,
   education_num, capital_gain, capital_loss, hours_per_week) = INPUT_COLUMNS
  # Build an estimator.

  # Reused Transformations.
  # Continuous columns can be converted to categorical via bucketization
  age_buckets = tf.feature_column.bucketized_column(
      age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])

  # Wide columns and deep columns.
  wide_columns = [
      # Interactions between different categorical features can also
      # be added as new virtual features.
      tf.feature_column.crossed_column(
          ['education', 'occupation'], hash_bucket_size=int(1e4)),
      tf.feature_column.crossed_column(
          [age_buckets, race, 'occupation'], hash_bucket_size=int(1e6)),
      tf.feature_column.crossed_column(
          ['native_country', 'occupation'], hash_bucket_size=int(1e4)),
      gender,
      native_country,
      education,
      occupation,
      workclass,
      marital_status,
      relationship,
      age_buckets,
  ]

  deep_columns = [
      # Use indicator columns for low dimensional vocabularies
      tf.feature_column.indicator_column(workclass),
      tf.feature_column.indicator_column(education),
      tf.feature_column.indicator_column(marital_status),
      tf.feature_column.indicator_column(gender),
      tf.feature_column.indicator_column(relationship),
      tf.feature_column.indicator_column(race),

      # Use embedding columns for high dimensional vocabularies
      tf.feature_column.embedding_column(
          native_country, dimension=embedding_size),
      tf.feature_column.embedding_column(occupation, dimension=embedding_size),
      age,
      education_num,
      capital_gain,
      capital_loss,
      hours_per_week,
  ]

  return tf.estimator.DNNLinearCombinedClassifier(
      config=config,
      linear_feature_columns=wide_columns,
      dnn_feature_columns=deep_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25]
  )


def parse_label_column(label_string_tensor):
  """Parses a string tensor into the label tensor
  Args:
    label_string_tensor: Tensor of dtype string. Result of parsing the
    CSV column specified by LABEL_COLUMN
  Returns:
    A Tensor of the same shape as label_string_tensor: an int64 Tensor of
    label indices for this classification task (a regression task would
    instead use a float32 Tensor of values).
  """
  # Build a Hash Table inside the graph
  table = tf.contrib.lookup.index_table_from_tensor(tf.constant(LABELS))

  # Use the hash table to convert string labels to integer label indices
  return table.lookup(label_string_tensor)


# ************************************************************************
# YOU NEED NOT MODIFY ANYTHING BELOW HERE TO ADAPT THIS MODEL TO YOUR DATA
# ************************************************************************


def csv_serving_input_fn():
  """Build the serving inputs."""
  csv_row = tf.placeholder(
      shape=[None],
      dtype=tf.string
  )
  features = parse_csv(csv_row)
  features.pop(LABEL_COLUMN)
  return tf.estimator.export.ServingInputReceiver(features, {'csv_row': csv_row})


def example_serving_input_fn():
  """Build the serving inputs."""
  example_bytestring = tf.placeholder(
      shape=[None],
      dtype=tf.string,
  )
  features = tf.parse_example(
      example_bytestring,
      tf.feature_column.make_parse_example_spec(INPUT_COLUMNS)
  )
  return tf.estimator.export.ServingInputReceiver(
      features,
      {'example_proto': example_bytestring}
  )

# [START serving-function]
def json_serving_input_fn():
  """Build the serving inputs."""
  inputs = {}
  for feat in INPUT_COLUMNS:
    inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

  return tf.estimator.export.ServingInputReceiver(inputs, inputs)
# [END serving-function]

SERVING_FUNCTIONS = {
    'JSON': json_serving_input_fn,
    'EXAMPLE': example_serving_input_fn,
    'CSV': csv_serving_input_fn
}


def parse_csv(rows_string_tensor):
  """Takes the string input tensor and returns a dict of rank-2 tensors."""

  # Takes a rank-1 tensor and converts it into rank-2 tensor
  # Example if the data is ['csv,line,1', 'csv,line,2', ..] to
  # [['csv,line,1'], ['csv,line,2']] which after parsing will result in a
  # tuple of tensors: [['csv'], ['csv']], [['line'], ['line']], [[1], [2]]
  row_columns = tf.expand_dims(rows_string_tensor, -1)
  columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
  features = dict(zip(CSV_COLUMNS, columns))

  # Remove unused columns
  for col in UNUSED_COLUMNS:
    features.pop(col)
  return features


def input_fn(filenames,
             num_epochs=None,
             shuffle=True,
             skip_header_lines=0,
             batch_size=200):
  """Generates features and labels for training or evaluation.
  This uses the input pipeline based approach using file name queue
  to read data so that entire data is not loaded in memory.

  Args:
      filenames: [str] list of CSV files to read data from.
      num_epochs: int how many times through to read the data.
        If None will loop through data indefinitely
      shuffle: bool, whether or not to randomize the order of data.
        Controls randomization of both file order and line order within
        files.
      skip_header_lines: int set to non-zero in order to skip header lines
        in CSV files.
      batch_size: int First dimension size of the Tensors returned by
        input_fn
  Returns:
      A (features, indices) tuple where features is a dictionary of
        Tensors, and indices is a single Tensor of label indices.
  """
  filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)
  if shuffle:
    # Process the files in a random order.
    filename_dataset = filename_dataset.shuffle(len(filenames))

  # For each filename, parse it into one element per line, and skip the header
  # if necessary.
  dataset = filename_dataset.flat_map(
      lambda filename: tf.data.TextLineDataset(filename).skip(skip_header_lines))

  dataset = dataset.map(parse_csv)
  if shuffle:
    dataset = dataset.shuffle(buffer_size=batch_size * 10)
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)
  iterator = dataset.make_one_shot_iterator()
  features = iterator.get_next()
  return features, parse_label_column(features.pop(LABEL_COLUMN))

task.py

import argparse
import os

import trainer.model as model

import tensorflow as tf
from tensorflow.contrib.learn.python.learn.utils import (
    saved_model_export_utils)
from tensorflow.contrib.training.python.training import hparam


def run_experiment(hparams):
  """Run the training and evaluate using the high level API"""

  train_input = lambda: model.input_fn(
      hparams.train_files,
      num_epochs=hparams.num_epochs,
      batch_size=hparams.train_batch_size
  )

  # Don't shuffle evaluation data
  eval_input = lambda: model.input_fn(
      hparams.eval_files,
      batch_size=hparams.eval_batch_size,
      shuffle=False
  )

  train_spec = tf.estimator.TrainSpec(train_input,
                                      max_steps=hparams.train_steps
                                      )

  exporter = tf.estimator.FinalExporter('census',
          model.SERVING_FUNCTIONS[hparams.export_format])
  eval_spec = tf.estimator.EvalSpec(eval_input,
                                    steps=hparams.eval_steps,
                                    exporters=[exporter],
                                    name='census-eval'
                                    )

  run_config = tf.estimator.RunConfig()
  run_config = run_config.replace(model_dir=hparams.job_dir)
  print('model dir {}'.format(run_config.model_dir))
  estimator = model.build_estimator(
      embedding_size=hparams.embedding_size,
      # Construct layer sizes with exponential decay
      hidden_units=[
          max(2, int(hparams.first_layer_size *
                     hparams.scale_factor**i))
          for i in range(hparams.num_layers)
      ],
      config=run_config
  )

  tf.estimator.train_and_evaluate(estimator,
                                  train_spec,
                                  eval_spec)

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # Input Arguments
  parser.add_argument(
      '--train-files',
      help='GCS or local paths to training data',
      nargs='+',
      required=True
  )
  parser.add_argument(
      '--num-epochs',
      help="""\
      Maximum number of training data epochs on which to train.
      If both --train-steps and --num-epochs are specified,
      the training job will run for --train-steps or --num-epochs,
      whichever occurs first. If unspecified, will run for --train-steps.\
      """,
      type=int,
  )
  parser.add_argument(
      '--train-batch-size',
      help='Batch size for training steps',
      type=int,
      default=40
  )
  parser.add_argument(
      '--eval-batch-size',
      help='Batch size for evaluation steps',
      type=int,
      default=40
  )
  parser.add_argument(
      '--eval-files',
      help='GCS or local paths to evaluation data',
      nargs='+',
      required=True
  )
  # Training arguments
  parser.add_argument(
      '--embedding-size',
      help='Number of embedding dimensions for categorical columns',
      default=8,
      type=int
  )
  parser.add_argument(
      '--first-layer-size',
      help='Number of nodes in the first layer of the DNN',
      default=100,
      type=int
  )
  parser.add_argument(
      '--num-layers',
      help='Number of layers in the DNN',
      default=4,
      type=int
  )
  parser.add_argument(
      '--scale-factor',
      help='How quickly should the size of the layers in the DNN decay',
      default=0.7,
      type=float
  )
  parser.add_argument(
      '--job-dir',
      help='GCS location to write checkpoints and export models',
      required=True
  )

  # Argument to turn on all logging
  parser.add_argument(
      '--verbosity',
      choices=[
          'DEBUG',
          'ERROR',
          'FATAL',
          'INFO',
          'WARN'
      ],
      default='INFO',
  )
  # Experiment arguments
  parser.add_argument(
      '--train-steps',
      help="""\
      Steps to run the training job for. If --num-epochs is not specified,
      this must be specified; otherwise the training job will run indefinitely.\
      """,
      type=int
  )
  parser.add_argument(
      '--eval-steps',
      help='Number of steps to run evaluation for at each checkpoint',
      default=100,
      type=int
  )
  parser.add_argument(
      '--export-format',
      help='The input format of the exported SavedModel binary',
      choices=['JSON', 'CSV', 'EXAMPLE'],
      default='JSON'
  )

  args = parser.parse_args()

  # Set python level verbosity
  tf.logging.set_verbosity(args.verbosity)
  # Set C++ Graph Execution level verbosity
  os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(
      tf.logging.__dict__[args.verbosity] / 10)

  # Run the training job
  hparams=hparam.HParams(**args.__dict__)
  run_experiment(hparams)

In TensorFlow 2.x, with Keras, write a new export signature that takes the original inputs plus a key. Note that you must define the shape of the original inputs appropriately:

# The input_signature must match the shape/dtype of the model's raw inputs
# and of the key you want passed through unchanged.
@tf.function(input_signature=[tf.TensorSpec([None, 1], dtype=tf.float32),
                              tf.TensorSpec([None, 1], dtype=tf.int32)])
def keyed_prediction(originput, key):
    # Run the model on the raw input and echo the key back alongside the prediction.
    pred = model(originput, training=False)
    return {
        'price': pred,
        'key': key
    }

model.save(EXPORT_PATH, signatures={'serving_default': keyed_prediction})
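For reference, a quick way to sanity-check the keyed export is to load the SavedModel back and call the signature directly; the keys should come back unchanged next to the predictions. A minimal sketch, reusing the originput/key/EXPORT_PATH names assumed above:

import tensorflow as tf

loaded = tf.saved_model.load(EXPORT_PATH)
keyed_fn = loaded.signatures['serving_default']

# Two instances with keys 0 and 1; each row of originput is one instance.
outputs = keyed_fn(
    originput=tf.constant([[1.0], [2.0]], dtype=tf.float32),
    key=tf.constant([[0], [1]], dtype=tf.int32))

print(outputs['key'])    # the keys, echoed back unchanged
print(outputs['price'])  # one prediction per key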

To modify the export signature in TensorFlow 1.x:

def make_keyed_estimator(estimator):
  """Wraps an Estimator so its export outputs pass through every prediction
  (including any feature that was forwarded into the predictions dict)."""
  config = estimator.config

  def model_fn2(features, labels, mode):
    estimator_spec = estimator._call_model_fn(features, labels, mode, config=config)
    if estimator_spec.export_outputs:
      for ekey in ['predict', 'serving_default']:
        estimator_spec.export_outputs[ekey] = \
            tf.estimator.export.PredictOutput(estimator_spec.predictions)
    return estimator_spec

  return tf.estimator.Estimator(model_fn=model_fn2, config=config)

See: https://towardsdatascience.com/how-to-extend-a-canned-tensorflow-estimator-to-add-more-evaluation-metrics-and-to-pass-through-ddf66cd3047d
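In the 1.x Estimator path of the tutorial, the key also has to reach the predictions dict before the export outputs are rewritten. A hedged sketch of one way to wire this up with tf.contrib.estimator.forward_features; the 'key' feature name and the keyed serving input function are assumptions added for illustration, not part of the original tutorial:

import tensorflow as tf
import trainer.model as model

KEY_COLUMN = 'key'  # hypothetical pass-through column, not in the census data

def keyed_json_serving_input_fn():
  """json_serving_input_fn from the tutorial, plus a key placeholder."""
  inputs = {}
  for feat in model.INPUT_COLUMNS:
    inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
  # The key is fed at prediction time and is never used by the model itself.
  inputs[KEY_COLUMN] = tf.placeholder(shape=[None], dtype=tf.int64)
  return tf.estimator.export.ServingInputReceiver(inputs, inputs)

run_config = tf.estimator.RunConfig(model_dir='output')
estimator = model.build_estimator(config=run_config)
# forward_features copies the key feature from the inputs into the predictions
# dict; the export-output rewrite above then keeps it in the SavedModel outputs.
estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)
estimator = make_keyed_estimator(estimator)  # wrapper from the 1.x snippet above

The keyed serving function would then be used in place of json_serving_input_fn when exporting, so each batch-prediction output row carries its key next to the prediction.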
